# FP8 Quantization

## Qwen2.5 VL 32B Instruct FP8 Dynamic

BCCard · Apache-2.0 · Image-to-Text, Transformers, English · Downloads: 140 · Likes: 1

An FP8 quantized version of the Qwen2.5-VL-32B-Instruct model that accepts visual and text input and produces text output, suited to efficient inference scenarios.
## Qwen3 235B A22B FP8 Dynamic

RedHatAI · Apache-2.0 · Large Language Model, Transformers · Downloads: 2,198 · Likes: 2

An FP8 quantized version of the Qwen3-235B-A22B model that reduces GPU memory requirements and improves computational throughput, suitable for a wide range of natural language processing scenarios.
## Qwen3 30B A3B FP8

Qwen · Apache-2.0 · Large Language Model, Transformers · Downloads: 107.85k · Likes: 57

Qwen3 is the latest generation of the Qwen (Tongyi Qianwen) series of large language models, offering a complete suite of dense and mixture-of-experts (MoE) models. Through large-scale training, Qwen3 has achieved breakthroughs in reasoning, instruction following, agent capabilities, and multilingual support.
## Gemma 3 27b It FP8 Dynamic

RedHatAI · Apache-2.0 · Image-to-Text, Transformers, English · Downloads: 1,608 · Likes: 1

A quantized version of google/gemma-3-27b-it with weights quantized to the FP8 data type. It accepts visual and text input, produces text output, and can be deployed for efficient inference with vLLM; a minimal loading sketch follows.
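The entry above mentions deployment with vLLM. As a minimal sketch, assuming vLLM is installed and that the checkpoint is published under the repository id used below (an assumption, not confirmed by this listing), offline inference with an FP8-dynamic model looks roughly like this:

```python
# Minimal sketch: load an FP8-dynamic checkpoint with vLLM and run a text prompt.
# The model id below is an assumption; substitute the actual repository name.
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/gemma-3-27b-it-FP8-dynamic",  # assumed repo id
    max_model_len=4096,
)
params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Summarize the benefits of FP8 inference."], params)
print(outputs[0].outputs[0].text)
```

vLLM typically detects the quantization scheme from the checkpoint's configuration, so pre-quantized FP8 weights usually need no extra quantization flag.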
## Qwen3 0.6B FP8

Qwen · Apache-2.0 · Large Language Model, Transformers · Downloads: 5,576 · Likes: 43

Qwen3-0.6B-FP8 is the latest generation of the Qwen (Tongyi Qianwen) series of large language models, offered as a 0.6B-parameter FP8 quantized checkpoint that supports switching between thinking and non-thinking modes and handles multilingual tasks; see the sketch after this entry.
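The mode switching mentioned above follows the Qwen3 chat-template convention of an `enable_thinking` flag. A minimal sketch with the transformers library, assuming the FP8 checkpoint loads through the standard `from_pretrained` flow on suitable hardware and that the repository id below is correct:

```python
# Minimal sketch: toggle Qwen3 thinking mode via the chat template.
# Assumes transformers is installed and the checkpoint id below is correct.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B-FP8"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Briefly explain FP8 quantization."}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # set True to enable the thinking mode
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```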
## FLUX.1 Dev ControlNet Union Pro 2.0 Fp8

ABDALLALSWAITI · Other license · Image Generation, English · Downloads: 2,023 · Likes: 15

The FP8 quantized version of the Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0 model, converted from the original BFloat16 weights using PyTorch's native FP8 support to optimize inference performance; a casting sketch follows.
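The BFloat16-to-FP8 conversion mentioned above can be illustrated with PyTorch's native float8 dtypes. This is an illustrative sketch of one common recipe (per-tensor scaling into float8_e4m3fn), not the exact procedure used for the published checkpoint:

```python
# Illustrative per-tensor FP8 quantization with PyTorch's native float8 dtype.
# The scaling scheme is an assumption; real pipelines may use per-channel scales.
import torch

def quantize_fp8(weight: torch.Tensor):
    """Quantize a BFloat16/FP32 tensor to float8_e4m3fn with a per-tensor scale."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max        # about 448
    scale = weight.abs().max().clamp(min=1e-12) / fp8_max
    q = (weight / scale).to(torch.float8_e4m3fn)           # 1 byte per element
    return q, scale

def dequantize_fp8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover a BFloat16 tensor where higher precision is needed."""
    return q.to(torch.bfloat16) * scale

w = torch.randn(4096, 4096, dtype=torch.bfloat16)
q, s = quantize_fp8(w)
print(q.dtype, q.element_size())  # torch.float8_e4m3fn 1
```

Storing weights at 1 byte per element instead of 2 is what roughly halves disk size and GPU memory relative to BFloat16.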
## Qwen2.5 VL 7B Instruct FP8 Dynamic

RedHatAI · Apache-2.0 · Image-to-Text, Transformers, English · Downloads: 25.18k · Likes: 1

An FP8 quantized version of Qwen2.5-VL-7B-Instruct that supports efficient vision-text inference through vLLM; a serving sketch follows.
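Beyond offline use, vLLM can serve such a checkpoint behind an OpenAI-compatible API. A minimal sketch, assuming a server started with `vllm serve <model>` on the default port; the repository id and image URL below are placeholders, not confirmed by this listing:

```python
# Minimal sketch: query a vLLM OpenAI-compatible server with an image + text prompt.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="RedHatAI/Qwen2.5-VL-7B-Instruct-FP8-dynamic",  # assumed repo id
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```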
## Qwen2.5 VL 3B Instruct FP8 Dynamic

RedHatAI · Apache-2.0 · Image-to-Text, Transformers, English · Downloads: 112 · Likes: 1

An FP8 quantized version of Qwen2.5-VL-3B-Instruct that supports visual-text input and text output with improved inference efficiency.
## Deepseek R1 Distill Llama 70B FP8 Dynamic

RedHatAI · MIT · Large Language Model, Transformers · Downloads: 45.77k · Likes: 9

An FP8 quantized version of DeepSeek-R1-Distill-Llama-70B that improves inference performance by reducing the bit width of weights and activations.
## Bamba 9B V1

ibm-ai-platform · Apache-2.0 · Large Language Model · Downloads: 16.19k · Likes: 35

Bamba-9B is a decoder-only language model based on the Mamba-2 architecture, trained in two stages, and performs well across a wide range of text generation tasks.
## Pixtral 12b FP8 Dynamic

RedHatAI · Apache-2.0 · Image-to-Text, Safetensors, Multiple Languages · Downloads: 87.31k · Likes: 9

pixtral-12b-FP8-dynamic is a quantized version of mistral-community/pixtral-12b. Quantizing weights and activations to the FP8 data type reduces disk size and GPU memory requirements by approximately 50%. It is suitable for commercial and research use in multiple languages.
## Llama 3.2 11B Vision Instruct FP8 Dynamic

RedHatAI · Image-to-Text, Safetensors, Multiple Languages · Downloads: 2,295 · Likes: 23

A quantized model based on Llama-3.2-11B-Vision-Instruct, suitable for multilingual commercial and research use, including assistant-style chat scenarios.
## Deepseek Coder V2 Lite Instruct FP8

RedHatAI · Other license · Large Language Model, Transformers · Downloads: 11.29k · Likes: 7

An FP8 quantized version of DeepSeek-Coder-V2-Lite-Instruct, optimized for inference efficiency and suitable for commercial and research use in English.
## Meta Llama 3 70B Instruct FP8

RedHatAI · Large Language Model, Transformers, English · Downloads: 22.10k · Likes: 13

Meta-Llama-3-70B-Instruct-FP8 is a quantized version of Meta-Llama-3-70B-Instruct. FP8 quantization reduces disk size and GPU memory requirements while maintaining strong performance. It is suitable for commercial and research use in English.